grep -l @frontier evals /news/*.json | wc -l → 1

Frontier Evals

mentions 1 type Person

// recent coverage 1 mention

11:10
2026-06-24
byteiota.com
ai-research

SWE-bench Pro: How to Read the Coding Agent Leaderboard

OpenAI abandoned SWE-bench Verified on February 23, 2026, after finding 59.4% of its hardest failed tests were broken and training data contamination inflated scores. Its replacement, SWE-bench Pro fr…

// co-occurs with top 7 entities